COMPLETE OPTIMIZATION OF MODEL PARAMETERS IN 
PARAMETRIC SPEECH CODERS 



BACKGROUND 

The present invention relates generally to speech encoding, and more 
particularly, to an encoder and a gradient search algorithm. 

Speech compression is a well known technology for encoding speech 
into digital data for transmission to a receiver which then reproduces the 
speech. The digitally encoded speech data can also be stored in a variety of 
digital media between encoding and later decoding (i.e., reproduction) of the 
speech. 

Speech synthesis systems differ from other analog and digital encoding 
systems that directly sample an acoustic sound at high bit rates and transmit 
the raw sampled data to the receiver. Direct sampling systems usually 
produce a high quality reproduction of the original acoustic sound and are 
typically preferred when quality reproduction is especially important. Common 
examples where direct sampling systems are usually used include music 
phonographs and cassette tapes (analog) and music compact discs and 
DVDs (digital). One disadvantage of direct sampling systems, however, is the 
large bandwidth required for transmission of the data and the large memory 
required for storage of the data. Thus, for example, in a typical encoding 
system which transmits raw speech data sampled from an original acoustic 
sound, a data rate as high as 96,000 bits per second is often required. 

In contrast, speech synthesis systems use a mathematical model of 
human speech production. The fundamental techniques of speech modeling 
are known in the art and are described in B.S. Atal and Suzanne L. Hanauer, 
Speech Analysis and Synthesis by Linear Prediction of the Speech Wave, 
The Journal of the Acoustical Society of America 637-55 (vol. 50 1971). The 
model of human speech production used in speech synthesis systems is 
usually referred to as a source-filter model. Generally, this model includes an 
excitation signal that represents air flow produced by the vocal folds, and a 
synthesis filter that represents the vocal tract (i.e., the glottis, mouth, tongue, 
nasal cavities and lips). Therefore, the excitation signal acts as an input 
signal to the synthesis filter similar to the way the vocal folds produce air flow 
to the vocal tract. The synthesis filter then alters the excitation signal to 
represent the way the vocal tract manipulates the air flow from the vocal folds. 
Thus, the resulting synthesized speech signal becomes an approximate 
representation of the original speech. 

One advantage of speech synthesis systems is that the bandwidth 
needed to transmit a digitized form of the original speech can be greatly 
reduced compared to direct sampling systems. Thus, by comparison, 
whereas direct sampling systems transmit raw acoustic data to describe the 
original sound, speech synthesis systems transmit only a limited amount of 
control data needed to recreate the mathematical speech model. As a result, 
a typical speech synthesis system can reduce the bandwidth needed to 
transmit speech to about 4,800 bits per second. 

One problem with speech synthesis systems, however, is that the quality 
of the reproduced speech is sometimes relatively poor compared to direct 
sampling systems. Most speech synthesis systems provide sufficient quality 
for the receiver to accurately perceive the content of the original speech. 
However, in some speech synthesis systems, the reproduced speech is not 
transparent. That is, while the receiver can understand the words originally 
spoken, the quality of the speech may be poor or annoying. Thus, a speech 
synthesis system that provides a more accurate speech production model is 
desirable. 

One solution that has been recognized for improving the quality of 
speech synthesis systems is described in U.S. Pat. Appl. 09/800,071 to 
Lashkari et al., hereby incorporated by reference. Briefly stated, this solution 
involves minimizing a synthesis error between an original speech sample and 
a synthesized speech sample. One difficulty that was discovered in that 
speech synthesis system, however, is the highly nonlinear nature of the 
synthesis error, which made the problem mathematically intractable. This 
difficulty was overcome by solving the problem using the roots of the 
synthesis filter polynomial instead of the coefficients of the polynomial. 
Accordingly, a root searching algorithm is described therein for finding the 
roots of the synthesis filter polynomial. 

In parametric speech coders that resolve the synthesis filter polynomial 
using roots instead of coefficients, the effectiveness and efficiency of the root 
searching algorithm used has an impact on the quality and performance of the 
speech coder. One root searching algorithm that may be used in such 
speech coders is a gradient search algorithm. As those in the art well know, 
gradient search algorithms use an iterative solution process that calculates a 
gradient vector for a function and estimates the unknown variables using the 
calculated gradient vector. However, improved gradient search algorithms 
are desired for use in parametric speech coders. 

BRIEF SUMMARY 

Accordingly, an improved gradient search algorithm is provided. The 
new, improved algorithm recalculates the gradient vector by taking into 
account the variations of the decomposition coefficients with respect to the 
roots. Thus, the gradient search algorithm is especially useful with linear 
predictive coding speech systems that optimize synthesized speech by 
searching for roots of a polynomial. 

BRIEF DESCRIPTION OF SEVERAL VIEWS OF THE DRAWINGS 

The invention, including its construction and method of operation, is 
illustrated more or less diagrammatically in the drawings, in which: 

Figure 1 is a block diagram of a speech analysis-by-synthesis system; 

Figure 2A is a flow chart of the proposed speech synthesis system; 

Figure 2B is a flow chart of an alternative speech synthesis system; 

Figure 3 is a flow chart of a gradient search algorithm; 

Figure 4 is a timeline-amplitude chart, comparing an original speech 
sample to an LPC synthesized speech and an optimally synthesized speech; 

Figure 5 is a chart, showing synthesis error reduction and improvement 
as a result of the optimization; and 



Figure 6 is a spectral chart, comparing an original speech sample to an 
LPC synthesized speech and an optimally synthesized speech. 

DESCRIPTION 

Referring now to the drawings, and particularly to Figure 1, a speech 
synthesis system is provided that minimizes the synthesis error in order to 
more accurately model the original speech. In Figure 1, a speech analysis- 
by-synthesis ("AbS") system is shown which is commonly referred to as a 
source-filter model. As is well known in the art, source-filter models are 
designed to mathematically model human speech production. Typically, the 
model assumes that the human sound-producing mechanisms that produce 
speech remain fixed, or unchanged, during successive short time intervals 
(e.g., 20 to 30 ms). The model further assumes that the human sound 
producing mechanisms can change between successive intervals. The 
physical mechanisms modeled by this system include air pressure variations 
generated by the vocal folds, glottis, mouth, tongue, nasal cavities and lips. 
Therefore, by limiting the digitally encoded data to a small set of control data 
for each interval, the speech decoder can reproduce the model and recreate 
the original speech. Thus, raw sampled data of the original speech is not 
transmitted from the encoder to the decoder. As a result, the digitally 
encoded data which is transmitted or stored (i.e., the bandwidth, or the 
number of bits) is much less than required by typical direct sampling systems. 

Accordingly, Figure 1 shows an original digitized speech 10 delivered 
to an excitation module 12. The excitation module 12 then analyzes each 
sample s(n) of the original speech and generates an excitation function u(n). 
The excitation function u(n) is typically a series of pulse signals that represent 
air bursts from the lungs which are released by the vocal folds to the vocal 
tract. Depending on the nature of the original speech sample s(n), the 
excitation function u(n) may be either a voiced 13, 14 or an unvoiced signal 
15. 

One way to improve the quality of reproduced speech in speech 
synthesis systems involves improving the accuracy of the voiced excitation 
function u(n). Traditionally, the excitation function u(n) has been treated as a 
series of pulses 13 with a fixed magnitude G and period P between the pitch 
pulses. As those in the art well know, the magnitude G and period P may 
vary between successive intervals. In contrast to the traditional fixed 
magnitude G and period P, it has previously been shown to the art that 
speech synthesis can be improved by optimizing the excitation function u(n) 
by varying the magnitude and pitch period of the excitation pulses 14. This 
improvement is described in Bishnu S. Atal and Joel R. Remde, A New Model 
of LPC Excitation For Producing Natural-Sounding Speech At Low Bit Rates, 
IEEE International Conference On Acoustics, Speech, And Signal Processing 
614-17 (1982). This optimization technique usually requires more intensive 
computing to encode the original speech s(n), but this problem has not been a 
significant disadvantage since modern computers provide sufficient computing 
power for optimization 14 of the excitation function u(n). A greater problem 
with this improvement has been the additional bandwidth that is required to 
transmit data for the variable excitation pulses 14. One solution to this 
problem is a coding system that is described in Manfred R. Schroeder and 
Bishnu S. Atal, Code-Excited Linear Prediction (CELP): High-Quality Speech 
At Very Low Bit Rates, IEEE International Conference On Acoustics, Speech, 
And Signal Processing 937-40 (1985). This solution involves categorizing a 
number of optimized excitation functions into a library of functions, or a 
codebook. The encoding excitation module 12 will then select an optimized 
excitation function from the codebook that produces a synthesized speech 
that most closely matches the original speech s(n). Then, a code that 
identifies the optimum codebook entry is transmitted to the decoder. When 
the decoder receives the transmitted code, the decoder then accesses a 
corresponding codebook to reproduce the selected optimal excitation function 
u(n). 
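
As a minimal sketch of the codebook selection just described, the following Python fragment passes each candidate excitation through an all-pole synthesis filter and keeps the entry with the smallest squared error against the original speech. The function names, the two-entry codebook and the first-order filter are assumptions made purely for illustration; practical CELP coders also use gain terms, perceptual weighting and far larger codebooks.

```python
def synthesize(u, a, gain=1.0):
    """All-pole filter: s_hat(n) = gain*u(n) - sum_k a[k-1]*s_hat(n-k)."""
    s_hat = []
    for n, u_n in enumerate(u):
        acc = gain * u_n
        for k, a_k in enumerate(a, start=1):
            if n - k >= 0:
                acc -= a_k * s_hat[n - k]
        s_hat.append(acc)
    return s_hat


def search_codebook(s, codebook, a):
    """Return (index, error) of the entry whose synthesis best matches s."""
    best_index, best_error = -1, float("inf")
    for index, u in enumerate(codebook):
        s_hat = synthesize(u, a)
        error = sum((x - y) ** 2 for x, y in zip(s, s_hat))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


# Hypothetical two-entry codebook and first-order filter A(z) = 1 - 0.5 z^-1.
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
a = [-0.5]
target = synthesize(codebook[1], a)   # stands in for the original speech
index, error = search_codebook(target, codebook, a)
```

Only `index` would need to be transmitted; a decoder holding the same codebook can look up the excitation itself.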

The excitation module 12 can also generate an unvoiced 15 excitation 
function u(n). An unvoiced 15 excitation function u(n) is used when the 
speaker's vocal folds are open and turbulent air flow is produced through the 
vocal tract. Most excitation modules 12 model this state by generating an 
excitation function u(n) consisting of white noise 15 (i.e., a random signal) 
instead of pulses. 
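
The two excitation types can be illustrated with a short Python sketch. The function names, the 160-sample frame (20 ms at an assumed 8 kHz sampling rate), the Gaussian noise source and the numeric values are illustrative assumptions, not details taken from the system described here.

```python
import random


def voiced_excitation(num_samples, gain, period):
    """Pulse train: one pulse of height `gain` every `period` samples."""
    return [gain if n % period == 0 else 0.0 for n in range(num_samples)]


def unvoiced_excitation(num_samples, gain, seed=0):
    """White noise: independent Gaussian samples scaled by `gain`."""
    rng = random.Random(seed)
    return [gain * rng.gauss(0.0, 1.0) for _ in range(num_samples)]


voiced = voiced_excitation(num_samples=160, gain=1.0, period=40)
unvoiced = unvoiced_excitation(num_samples=160, gain=0.1)
```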

Next, the synthesis filter 16 models the vocal tract and its effect on the 
air flow from the vocal folds. Typically, the synthesis filter 16 uses a 
polynomial equation to represent the various shapes of the vocal tract. This 
technique can be visualized by imagining a multiple section hollow tube with a 
number of different diameters along the length of the tube. Accordingly, the 
synthesis filter 16 alters the characteristics of the excitation function u(n) 
similar to the way the vocal tract alters the air flow from the vocal folds, or in 
other words, like a variable diameter hollow tube alters inflowing air. 

According to Atal and Remde, supra, the synthesis filter 16 can be 
represented by the mathematical formula: 

$$H(z) = \frac{G}{A(z)} \tag{1}$$

where G is a gain term representing the loudness of the voice. A(z) is a 
polynomial of order M and can be represented by the formula: 

$$A(z) = 1 + \sum_{k=1}^{M} a_k z^{-k} \tag{2}$$

The order of the polynomial A(z) can vary depending on the particular 
application, but a 10th order polynomial is commonly used with an 8 kHz 
sampling rate. The relationship of the synthesized speech $\hat{s}(n)$ to the 
excitation function u(n) as determined by the synthesis filter 16 can be defined 
by the formula: 

$$\hat{s}(n) = G\,u(n) - \sum_{k=1}^{M} a_k \hat{s}(n-k) \tag{3}$$
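
For a concrete feel for formula (3), consider a hypothetical first-order filter (M = 1) with a_1 = -0.5 and G = 1, driven by a unit impulse (u(0) = 1 and u(n) = 0 otherwise). These values are chosen only for illustration. The recursion then unrolls as:

```latex
\begin{aligned}
\hat{s}(0) &= u(0) = 1 \\
\hat{s}(1) &= -a_1\,\hat{s}(0) = 0.5 \\
\hat{s}(2) &= -a_1\,\hat{s}(1) = 0.25 \\
\hat{s}(n) &= (0.5)^n \quad \text{for } n \geq 0
\end{aligned}
```

The output decays geometrically because the single root of A(z), here 0.5, lies inside the unit circle.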



Conventionally, the coefficients $a_1 \ldots a_M$ of this polynomial are 
computed using a technique known in the art as linear predictive coding 
("LPC"). LPC-based techniques compute the polynomial coefficients $a_1 \ldots a_M$ 
by minimizing the total prediction error $E_p$. Accordingly, the sample prediction 
error $e_p(n)$ is defined by the formula: 

$$e_p(n) = s(n) + \sum_{k=1}^{M} a_k s(n-k) \tag{4}$$

The total prediction error $E_p$ is then defined by the formula: 

$$E_p = \sum_{n=0}^{N-1} e_p^2(n) \tag{5}$$

where N is the length of the analysis window in number of samples. The 
polynomial coefficients $a_1 \ldots a_M$ can now be computed by minimizing the total 
prediction error $E_p$ using well known mathematical techniques. 
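
One widely used way to carry out this minimization is the autocorrelation method solved with the Levinson-Durbin recursion. The text above does not name a specific technique, so the sketch below is an assumption about one reasonable choice; the function names and the toy signal are hypothetical.

```python
def autocorrelation(s, max_lag):
    """r[k] = sum_n s(n) * s(n - k) over the analysis window."""
    return [sum(s[n] * s[n - k] for n in range(k, len(s)))
            for k in range(max_lag + 1)]


def lpc(s, order):
    """Levinson-Durbin recursion; returns [a_1, ..., a_M] for
    A(z) = 1 + sum_k a_k z^-k."""
    r = autocorrelation(s, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[k] * r[i - k] for k in range(1, i))
        reflection = -acc / err
        prev = a[:]
        for k in range(1, i):
            a[k] = prev[k] + reflection * prev[i - k]
        a[i] = reflection
        err *= 1.0 - reflection ** 2
    return a[1:]


# Toy signal s(n) = 0.9^n, which a first-order predictor models almost exactly.
s = [0.9 ** n for n in range(200)]
coeffs = lpc(s, order=2)
```

For this toy signal the first coefficient comes out close to -0.9 and the second close to zero, matching the sign convention of formula (4).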

One problem with the LPC technique of computing the polynomial 
coefficients $a_1 \ldots a_M$ is that only the prediction error is minimized. Thus, the 
LPC technique does not minimize the error between the original speech s(n) 
and the synthesized speech $\hat{s}(n)$. Accordingly, the sample synthesis error 
$e_s(n)$ can be defined by the formula: 

$$e_s(n) = s(n) - \hat{s}(n) \tag{6}$$

The total synthesis error $E_s$ can then be defined by the formula: 

$$E_s = \sum_{n=0}^{N-1} e_s^2(n) = \sum_{n=0}^{N-1} \left(s(n) - \hat{s}(n)\right)^2 \tag{7}$$

where N is the length of the analysis window. Like the total prediction error $E_p$ 
discussed above, the total synthesis error $E_s$ should be minimized to compute 
the optimum filter coefficients $a_1 \ldots a_M$. However, one difficulty with this 
technique is that the synthesized speech $\hat{s}(n)$ as represented in formula (3) 
makes the total synthesis error $E_s$ a highly nonlinear function that is generally 
mathematically intractable. 

One solution to this mathematical difficulty is to minimize the total 
synthesis error $E_s$ using the roots of the polynomial A(z) instead of the 
coefficients $a_1 \ldots a_M$. Using roots instead of coefficients for optimization also 
provides control over the stability of the synthesis filter 16. Accordingly, 
assuming that h(n) is the impulse response of the synthesis filter 16, the 
synthesized speech $\hat{s}(n)$ is now defined by the formula: 

$$\hat{s}(n) = h(n) * u(n) = \sum_{k=0}^{n} h(k)\,u(n-k) \tag{8}$$

where * is the convolution operator. In this formula, it is also assumed that 
the excitation function u(n) is zero outside of the interval 0 to N-1. Using the 
roots of A(z), the polynomial can now be expressed by the formula: 

$$A(z) = (1 - \lambda_1 z^{-1}) \cdots (1 - \lambda_M z^{-1}) \tag{9}$$

where $\lambda_1 \ldots \lambda_M$ represent the roots of the polynomial A(z). These roots may 
be either real or complex. Thus, in the preferred 10th order polynomial, A(z) 
will have 10 different roots. 

Using parallel decomposition, the synthesis filter function H(z) is now 
represented in terms of the roots by the formula: 

$$H(z) = \frac{1}{A(z)} = \sum_{i=1}^{M} \frac{b_i}{1 - \lambda_i z^{-1}} \tag{10}$$

(the gain term G is omitted from this and the remaining formulas for 
simplicity). The decomposition coefficients $b_i$ are then calculated by the 
residue method for polynomials, thus providing the formula: 

$$b_i = \prod_{j=1,\, j \neq i}^{M} \frac{1}{1 - \lambda_j \lambda_i^{-1}} \tag{11}$$



The impulse response h(n) can also be represented in terms of the roots by 
the formula: 

$$h(n) = \sum_{i=1}^{M} b_i (\lambda_i)^n \tag{12}$$

Next, by combining formula (12) with formula (8), the synthesized 
speech $\hat{s}(n)$ can be expressed by the formula: 

$$\hat{s}(n) = \sum_{k=0}^{n} h(k)\,u(n-k) = \sum_{k=0}^{n} u(n-k) \sum_{i=1}^{M} b_i (\lambda_i)^k \tag{13}$$

Therefore, by substituting formula (13) into formula (7), the total synthesis 
error $E_s$ can be minimized using polynomial roots and a gradient search 
algorithm. 
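
The root-domain expressions in formulas (11), (12), (13) and (7) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the two real roots are hypothetical, the gain G is taken as 1, and the code assumes distinct, nonzero roots so that no division by zero occurs.

```python
def decomposition(roots):
    """Formula (11): b_i = prod_{j != i} 1 / (1 - lambda_j / lambda_i)."""
    b = []
    for i, lam_i in enumerate(roots):
        prod = 1.0
        for j, lam_j in enumerate(roots):
            if j != i:
                prod /= 1.0 - lam_j / lam_i
        b.append(prod)
    return b


def synthesize_from_roots(roots, u):
    """Formula (13): s_hat(n) = sum_k u(n-k) * sum_i b_i * lambda_i**k."""
    b = decomposition(roots)
    # Formula (12): impulse response expressed through the roots.
    h = [sum(b_i * lam ** k for b_i, lam in zip(b, roots))
         for k in range(len(u))]
    return [sum(h[k] * u[n - k] for k in range(n + 1)) for n in range(len(u))]


def synthesis_error(s, s_hat):
    """Formula (7): total squared synthesis error."""
    return sum(abs(x - y) ** 2 for x, y in zip(s, s_hat))


roots = [0.5, -0.25]          # hypothetical real roots, M = 2
u = [1.0, 0.0, 0.0, 0.0]      # unit impulse, so s_hat equals h
b = decomposition(roots)
s_hat = synthesize_from_roots(roots, u)
```

With these roots A(z) = 1 - 0.25 z^-1 - 0.125 z^-2, and the values produced by formula (13) agree with what the recursion of formula (3) gives for the same filter.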

A number of root searching algorithms may be used to minimize the 
total synthesis error $E_s$. One possible algorithm, however, is an iterative 
gradient search algorithm. Accordingly, denoting the root vector at the j-th 
iteration as $\Lambda^{(j)}$, the root vector can be expressed by the formula: 

$$\Lambda^{(j)} = \left[\lambda_1^{(j)} \ldots \lambda_i^{(j)} \ldots \lambda_M^{(j)}\right]^T \tag{14}$$

where $\lambda_i^{(j)}$ is the value of the i-th root at the j-th iteration and T is the 
transpose operator. The search algorithm begins with the LPC solution as the 
starting point, which is expressed by the formula: 

$$\Lambda^{(0)} = \left[\lambda_1^{(0)} \ldots \lambda_i^{(0)} \ldots \lambda_M^{(0)}\right]^T \tag{15}$$

To compute $\Lambda^{(0)}$, the LPC coefficients $a_1 \ldots a_M$ are converted to the 
corresponding roots $\lambda_1^{(0)} \ldots \lambda_M^{(0)}$ using a standard root finding algorithm. 
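
The conversion from coefficients to roots can be sketched as follows. The text does not say which root finder is meant, so the Durand-Kerner iteration used here is an assumption, as are the function name, the starting points and the fixed iteration count.

```python
def lpc_to_roots(a, iterations=200):
    """Roots of z^M + a_1 z^(M-1) + ... + a_M, i.e. the lambda_i of formula (9)."""
    coeffs = [1.0] + list(a)
    order = len(a)
    # Customary distinct complex starting points for Durand-Kerner.
    roots = [complex(0.4, 0.9) ** k for k in range(order)]
    for _ in range(iterations):
        updated = []
        for i, z in enumerate(roots):
            value = 0.0
            for c in coeffs:              # Horner evaluation of the polynomial
                value = value * z + c
            denom = 1.0
            for j, w in enumerate(roots):
                if j != i:
                    denom *= z - w
            updated.append(z - value / denom)
        roots = updated
    return roots


# A(z) = 1 - 0.25 z^-1 - 0.125 z^-2 = (1 - 0.5 z^-1)(1 + 0.25 z^-1)
roots = lpc_to_roots([-0.25, -0.125])
```

The roots are returned as complex numbers even when they are real, which suits the general case where roots occur in complex-conjugate pairs.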

Next, the roots at subsequent iterations can be expressed by the 
formula: 

$$\Lambda^{(j+1)} = \Lambda^{(j)} + \mu \nabla_j E_s \tag{16}$$

where $\mu$ is the step size and $\nabla_j E_s$ is the gradient of the synthesis error $E_s$ 
relative to the roots at iteration j. The step size $\mu$ can be either fixed for each 
iteration, or alternatively, it can be variable and adapted for each iteration. 
Using formula (7), the synthesis error gradient vector $\nabla_j E_s$ can now be 
calculated by the formula: 

$$\nabla_j E_s = \sum_{k=0}^{N-1} \left(s(k) - \hat{s}(k)\right) \nabla_j \hat{s}(k) \tag{17}$$

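Formula (17) demonstrates that the synthesis error gradient vector $\nabla_j E_s$ 
can be calculated using the gradient vector of the synthesized speech 
samples $\hat{s}(k)$. Accordingly, the synthesized speech gradient vector $\nabla_j \hat{s}(k)$ can 
be defined by the formula: 

$$\nabla_j \hat{s}(k) = \left[\partial \hat{s}(k)/\partial \lambda_1^{(j)} \ldots \partial \hat{s}(k)/\partial \lambda_r^{(j)} \ldots \partial \hat{s}(k)/\partial \lambda_M^{(j)}\right] \tag{18}$$

where $\partial \hat{s}(k)/\partial \lambda_r^{(j)}$ is the partial derivative of $\hat{s}(k)$ at iteration j with respect to the 
r-th root. Using formula (13), the partial derivative $\partial \hat{s}(k)/\partial \lambda_r^{(j)}$ can be 
calculated by the formula: 

$$\partial \hat{s}(k)/\partial \lambda_r = \sum_{m=0}^{k} \sum_{i=1}^{M} u(k-m)\, \partial\!\left[b_i \lambda_i^m\right]/\partial \lambda_r \tag{19}$$
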
(the superscript j is omitted from formula (19) through formula (28) for 
notational simplicity). Formula (19) can now be expressed using the chain 
rule of differentiation by the formula: 

$$\partial\!\left[b_i \lambda_i^m\right]/\partial \lambda_r = \lambda_i^m\, \partial b_i/\partial \lambda_r + m\, b_i \lambda_r^{(m-1)} \delta(r-i) \tag{20}$$

where $\delta(r-i)$ is the delta function (i.e., $\delta(r-i)=1$ for $r=i$ and $\delta(r-i)=0$ for $r \neq i$). 

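To resolve formula (20), the partial derivative $\partial b_i/\partial \lambda_r$ must be 
calculated. Therefore, formula (11) can be substituted into the partial 
derivative $\partial b_i/\partial \lambda_r$ to provide the formula: 

$$\partial b_i/\partial \lambda_r = \partial\!\left\{\prod_{j=1,\, j \neq i}^{M} \left[\frac{1}{1 - \lambda_j \lambda_i^{-1}}\right]\right\}/\partial \lambda_r \tag{21}$$

To resolve the partial derivative of formula (21), the partial derivative must be 
calculated for two cases, including $r \neq i$ and $r=i$. 

In the first case of formula (21), where $r \neq i$, only one multiplicative term 
of $1/(1 - \lambda_r \lambda_i^{-1})$, which corresponds to j=r, depends on $\lambda_r$. Therefore, the partial 
derivative of formula (21) can be expressed by the formula: 

$$\partial\!\left\{\prod_{j=1,\, j \neq i}^{M} \left[\frac{1}{1 - \lambda_j \lambda_i^{-1}}\right]\right\}/\partial \lambda_r = \left\{\prod_{j=1,\, j \neq i,\, j \neq r}^{M} \left[\frac{1}{1 - \lambda_j \lambda_i^{-1}}\right]\right\} \partial\!\left[\frac{1}{1 - \lambda_r \lambda_i^{-1}}\right]/\partial \lambda_r \quad (r \neq i) \tag{22a}$$

Next, the partial derivative of $1/(1 - \lambda_r \lambda_i^{-1})$ can be calculated by the formula: 

$$\partial\!\left[\frac{1}{1 - \lambda_r \lambda_i^{-1}}\right]/\partial \lambda_r = \frac{\lambda_i}{(\lambda_i - \lambda_r)^2} \tag{22b}$$

By substituting formula (22b) into formula (22a) and simplifying, formula (22a) 
can be expressed by the formula: 

$$\partial\!\left\{\prod_{j=1,\, j \neq i}^{M} \left[\frac{1}{1 - \lambda_j \lambda_i^{-1}}\right]\right\}/\partial \lambda_r = \frac{b_i}{\lambda_i - \lambda_r} \quad (r \neq i) \tag{22c}$$

By substituting formula (22c) into formula (21) and further simplifying, the 
partial derivative $\partial b_i/\partial \lambda_r$ for the case of $r \neq i$ can now be expressed by the 
formula: 

$$\partial b_i/\partial \lambda_r = \frac{b_i}{\lambda_i} \left[\frac{1}{1 - \lambda_r \lambda_i^{-1}}\right] \quad (r \neq i) \tag{22d}$$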

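In the second case of formula (21), where $r=i$, all of the M-1 multiplicative 
terms of $1/(1 - \lambda_j \lambda_i^{-1})$ depend on $\lambda_i$. Therefore, the partial derivative of formula 
(21) can be calculated as the sum of the M-1 contributions to the partial 
derivative. Thus, using the q-th multiplicative term (i.e., $1/(1 - \lambda_q \lambda_i^{-1})$), the 
contribution to the partial derivative due to this term alone can be expressed 
by the formula: 

$$\left\{\prod_{j=1,\, j \neq i,\, j \neq q}^{M} \left[\frac{1}{1 - \lambda_j \lambda_i^{-1}}\right]\right\} \partial\!\left[\frac{1}{1 - \lambda_q \lambda_i^{-1}}\right]/\partial \lambda_i \quad (r = i) \tag{23a}$$

Next, the partial derivative of $1/(1 - \lambda_q \lambda_i^{-1})$ can be calculated by the formula: 

$$\partial\!\left[\frac{1}{1 - \lambda_q \lambda_i^{-1}}\right]/\partial \lambda_i = \frac{-\lambda_q}{(\lambda_i - \lambda_q)^2} \tag{23b}$$

By substituting formula (23b) into formula (23a) and simplifying, formula (23a) 
can be expressed by the formula: 

$$\left\{\prod_{j=1,\, j \neq i,\, j \neq q}^{M} \left[\frac{1}{1 - \lambda_j \lambda_i^{-1}}\right]\right\} \partial\!\left[\frac{1}{1 - \lambda_q \lambda_i^{-1}}\right]/\partial \lambda_i = \frac{b_i}{\lambda_i \left(1 - \lambda_i \lambda_q^{-1}\right)} \tag{23c}$$

Using formula (23c) to add up all of the contributions in formula (23a) and 
then substituting the result into formula (21) and further simplifying, the partial 
derivative $\partial b_i/\partial \lambda_r$ for the case of $r=i$ can now be expressed by the formula: 

$$\partial b_i/\partial \lambda_r = \frac{b_i}{\lambda_i} \sum_{j=1,\, j \neq i}^{M} \left[\frac{1}{1 - \lambda_i \lambda_j^{-1}}\right] \quad (r = i) \tag{23d}$$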

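In order to unify the two cases of $r \neq i$ and $r=i$, the function K(i,r) can be 
defined by the following formulas: 

$$K(i,r) = \frac{1}{1 - \lambda_r \lambda_i^{-1}} \quad (\text{if } r \neq i) \tag{24a}$$

$$K(i,r) = \sum_{j=1,\, j \neq i}^{M} \left[\frac{1}{1 - \lambda_i \lambda_j^{-1}}\right] \quad (\text{if } r = i) \tag{24b}$$

The partial derivative $\partial b_i/\partial \lambda_r$ can now be simplified for both cases by the 
formula: 

$$\partial b_i/\partial \lambda_r = b_i K(i,r)/\lambda_i \quad (\text{for any } r) \tag{25}$$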

By substituting formula (25) into formula (20), the partial derivative 
$\partial[b_i \lambda_i^m]/\partial \lambda_r$ can now be expressed by the formula: 

$$\partial\!\left[b_i \lambda_i^m\right]/\partial \lambda_r = b_i \left[K(i,r)\lambda_i^{(m-1)} + m \lambda_r^{(m-1)} \delta(r-i)\right] \tag{26}$$

In formula (26), the first term of the formula (i.e., $K(i,r)\lambda_i^{(m-1)}$) is the contribution 
of $\partial b_i/\partial \lambda_r$ while the second term of the formula (i.e., $m \lambda_r^{(m-1)} \delta(r-i)$) is the 
contribution of the m-th power of $\lambda_i$. 

By substituting formula (26) into formula (19), the partial derivative of 
the k-th sample of the synthesized speech with respect to the r-th root can be 
expressed by the formula: 

$$\partial \hat{s}(k)/\partial \lambda_r = \sum_{m=0}^{k} u(k-m) \sum_{i=1}^{M} b_i \left[K(i,r)\lambda_i^{(m-1)} + m \lambda_r^{(m-1)} \delta(r-i)\right] \tag{27}$$

By simplifying formula (27), the partial derivative can be expressed by the 
formula: 

$$\partial \hat{s}(k)/\partial \lambda_r = \sum_{m=0}^{k} u(k-m) \sum_{i=1}^{M} b_i K(i,r)\lambda_i^{(m-1)} + b_r \sum_{m=1}^{k} m\,u(k-m)\lambda_r^{(m-1)} \tag{28}$$

For completeness, the iteration index j can be inserted back into formula (28) 
to express the partial derivative of the synthesized speech at iteration j by the 
formula: 

$$\partial \hat{s}(k)/\partial \lambda_r^{(j)} = \sum_{m=0}^{k} u(k-m) \sum_{i=1}^{M} b_i K(i,r)\left(\lambda_i^{(j)}\right)^{(m-1)} + b_r \sum_{m=1}^{k} m\,u(k-m)\left(\lambda_r^{(j)}\right)^{(m-1)} \quad (k \geq 0) \tag{29}$$
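
As a sanity check on formula (28), consider a hypothetical second-order case (M = 2) with a unit impulse excitation (u(0) = 1, u(n) = 0 otherwise) at sample k = 1. This small case is an illustrative assumption, not an example taken from the figures. Formula (13) gives the synthesized sample directly, and formula (28), in which only the m = 1 terms survive, reproduces its derivative:

```latex
\begin{aligned}
b_1 &= \frac{\lambda_1}{\lambda_1 - \lambda_2}, \qquad
b_2 = \frac{\lambda_2}{\lambda_2 - \lambda_1}, \qquad b_1 + b_2 = 1 \\
\hat{s}(1) &= b_1 \lambda_1 + b_2 \lambda_2 = \lambda_1 + \lambda_2
\quad\Rightarrow\quad \partial \hat{s}(1)/\partial \lambda_1 = 1 \\
K(1,1) &= K(2,1) = \frac{1}{1 - \lambda_1 \lambda_2^{-1}} = \frac{-\lambda_2}{\lambda_1 - \lambda_2} \\
\partial \hat{s}(1)/\partial \lambda_1 &= b_1 K(1,1) + b_2 K(2,1) + b_1
= \frac{-\lambda_2}{\lambda_1 - \lambda_2} + \frac{\lambda_1}{\lambda_1 - \lambda_2} = 1
\end{aligned}
```

Dropping the K(i,r) terms, that is, treating the decomposition coefficients as constants, would instead give b_1, which differs from 1 whenever the second root is nonzero.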

The synthesis error gradient vector $\nabla_j E_s$ is now calculated by 
substituting formula (29) into formula (18) and formula (18) into formula (17). 
The subsequent root vector $\Lambda^{(j+1)}$ at the next iteration can then be calculated 
by substituting the result of formula (17) into formula (16). The iterations of 
the gradient search algorithm are then repeated until the synthesis error 
$E_s$ is reduced by a desired percentage from the LPC prediction error $E_p$, a 
predetermined number of iterations is completed, or the roots are resolved 
within a predetermined acceptable range. 
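
The iteration described above can be sketched end to end in Python. Several simplifications here are assumptions rather than details from the text: the roots are real, distinct and nonzero (complex-conjugate pairs would need extra handling), the step size is halved until the synthesis error drops (one way of making it "variable and adapted"), and the function names, stopping tolerance and test signal are hypothetical.

```python
def decomposition(roots):
    """Formula (11)."""
    b = []
    for i, lam_i in enumerate(roots):
        prod = 1.0
        for j, lam_j in enumerate(roots):
            if j != i:
                prod /= 1.0 - lam_j / lam_i
        b.append(prod)
    return b


def synthesize(roots, u):
    """Formula (13)."""
    b = decomposition(roots)
    return [sum(u[n - k] * sum(b_i * lam ** k for b_i, lam in zip(b, roots))
                for k in range(n + 1)) for n in range(len(u))]


def error(roots, u, s):
    """Formula (7)."""
    return sum((x - y) ** 2 for x, y in zip(s, synthesize(roots, u)))


def k_factor(roots, i, r):
    """Formulas (24a) and (24b)."""
    if r != i:
        return 1.0 / (1.0 - roots[r] / roots[i])
    return sum(1.0 / (1.0 - roots[i] / lam_j)
               for j, lam_j in enumerate(roots) if j != i)


def gradient(roots, u, s):
    """Formula (17), with each partial derivative taken from formula (28)."""
    b = decomposition(roots)
    s_hat = synthesize(roots, u)
    grad = []
    for r in range(len(roots)):
        total = 0.0
        for k in range(len(u)):
            d = 0.0
            for m in range(k + 1):
                # Contribution of the varying decomposition coefficients.
                d += u[k - m] * sum(
                    b[i] * k_factor(roots, i, r) * roots[i] ** (m - 1)
                    for i in range(len(roots)))
                # Contribution of the m-th power of the r-th root (zero at m = 0).
                d += b[r] * m * u[k - m] * roots[r] ** (m - 1)
            total += (s[k] - s_hat[k]) * d
        grad.append(total)
    return grad


def gradient_search(roots, u, s, mu=0.5, max_iter=50, tol=1e-12):
    """Formula (16), halving the step until the synthesis error decreases."""
    best = error(roots, u, s)
    for _ in range(max_iter):
        grad = gradient(roots, u, s)
        step = mu
        while step > 1e-12:
            trial = [lam + step * g for lam, g in zip(roots, grad)]
            trial_error = error(trial, u, s)
            if trial_error < best:
                roots, best = trial, trial_error
                break
            step /= 2.0
        else:
            break                     # no improving step was found
        if best < tol:
            break
    return roots, best


target_roots = [0.5, -0.25]           # hypothetical "true" filter
u = [1.0, 0.0, 0.5, 0.0, 0.0, -0.3, 0.0, 0.0]
s = synthesize(target_roots, u)       # stands in for the original speech
start_roots = [0.6, -0.3]             # stands in for the LPC starting point
initial_error = error(start_roots, u, s)
final_roots, final_error = gradient_search(start_roots, u, s)
```

Because a step is accepted only when it lowers the error, the final synthesis error cannot exceed the starting one in this sketch; a fixed step size, as the text also allows, would not carry that guarantee.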

Although control data for the optimal synthesis polynomial A(z) can be 
transmitted in a number of different formats, it is preferable to convert the 
roots found by the optimization technique described above back into 
polynomial coefficients $a_1 \ldots a_M$. The conversion can be performed by well 
known mathematical techniques. This conversion allows the optimized 
synthesis polynomial A(z) to be transmitted in the same format as existing 
speech coders, thus promoting compatibility with current standards. 
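
The conversion back to coefficients amounts to multiplying out the factors of formula (9). A minimal Python sketch follows; the function name is hypothetical, and the code assumes that any complex roots occur in conjugate pairs so that the imaginary parts of the coefficients cancel.

```python
def roots_to_coefficients(roots):
    """Expand prod_i (1 - lambda_i z^-1) into 1 + a_1 z^-1 + ... + a_M z^-M
    and return [a_1, ..., a_M]."""
    poly = [1.0]
    for lam in roots:
        shifted = [0.0] + poly        # poly multiplied by z^-1
        padded = poly + [0.0]
        poly = [p - lam * q for p, q in zip(padded, shifted)]
    # Conjugate-pair roots leave only rounding-level imaginary parts.
    return [c.real if isinstance(c, complex) else c for c in poly[1:]]


a_real = roots_to_coefficients([0.5, -0.25])
a_pair = roots_to_coefficients([0.5 + 0.5j, 0.5 - 0.5j])
```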

Now that the synthesis model has been completely determined, the 
control data for the model is quantized into digital data for transmission or 
storage. Many different industry standards exist for quantization. However, in 
one example, the control data that is quantized includes ten synthesis filter 
coefficients $a_1 \ldots a_{10}$, one gain value G for the magnitude of the excitation 
function pulses, one pitch period value P for the frequency of the excitation 
function pulses, and one indicator for a voiced 13 or unvoiced 15 excitation 
function u(n). As is apparent, this example does not include an optimized 
excitation pulse 14, which could be included with some additional control data. 
Accordingly, the described example requires the transmission of thirteen 
distinct variables at the end of each speech frame. Commonly, the thirteen 
variables are quantized into a total of 80 bits. Thus, according to this 
example, the synthesized speech $\hat{s}(n)$, including optimization, can be 
transmitted within a bandwidth of 4,000 bits/s (80 bits/frame ÷ 0.020 s/frame). 

As shown in Figure 1, the order of operations can be changed 
depending on the accuracy desired and the computing capacity available. 
Thus, in the embodiment described above, the excitation function u(n) was 
first determined to be a preset series of pulses 13 for voiced speech or an 
unvoiced signal 15. Second, the synthesis filter polynomial A(z) was 
determined using conventional techniques, such as the LPC method. Third, 
the synthesis polynomial A(z) was optimized. 

In Figures 2A and 2B, different encoding sequences are shown which 
should provide more accurate synthesis and may be used with CELP-type 
speech encoders. However, some additional computing power will typically 
be required. In these sequences, the original digitized speech sample 30 is 
used to compute 32 the polynomial coefficients $a_1 \ldots a_M$ using the LPC 
technique described above or another comparable method. The polynomial 
coefficients $a_1 \ldots a_M$ are then used to find 36 the optimum excitation function 
u(n) from a codebook. Alternatively, an individual excitation function u(n) can 
be found 40 from the codebook for each iteration. After selection of the 
excitation function u(n), the polynomial coefficients $a_1 \ldots a_M$ are then also 
optimized. To make optimization of the coefficients $a_1 \ldots a_M$ easier, the 
polynomial coefficients $a_1 \ldots a_M$ are first converted 34 to the roots of the 
polynomial A(z). A gradient search algorithm is then used to optimize 38, 42, 
44 the roots. Once the optimal roots are found, the roots are then 
converted 46 back to polynomial coefficients $a_1 \ldots a_M$ for compatibility with 
existing encoding-decoding systems. Lastly, the synthesis model and the 
index to the codebook entry are quantized 48 for transmission or storage. 

Additional encoding sequences are also possible for improving the 
accuracy of the synthesis model or for changing the computing capacity 
needed to encode the synthesis model. Some of these alternative sequences 
are demonstrated in Figure 1 by dashed routing lines. For example, the 
excitation function u(n) can be reoptimized at various stages during encoding 
of the synthesis model. 

In Figure 3, a flow chart of the gradient search algorithm is shown. 
After the polynomial coefficients $a_1 \ldots a_M$ have been converted to roots 34, 
first roots of the polynomial are computed 50. The initial roots may be 
determined by several methods, including root finding algorithms such as 
Newton-Raphson or interval halving. Decomposition coefficients $b_i$ are then 
calculated using the first computed roots 52. Next, the gradient vector of the 
polynomial is calculated using the contribution of the decomposition 
coefficients $b_i$ 54. Once the gradient vector is calculated for the first computed 
roots, the gradient vector is used to calculate second estimated roots 56. A 
test is then performed to determine whether the search should end or whether 
it should continue 58. Several tests may be used, including testing whether 
the LPC prediction error $E_p$ has been reduced by a desired percentage, 
whether a limited number of iterations has been completed, or whether the 
estimated roots are within an acceptable range. If the search is determined to 
be complete, the gradient search algorithm stops and the estimated roots are 
passed on to the speech synthesis system for further processing 58. On the 
other hand, if the search is not determined to be complete, the decomposition 
coefficients $b_i$ are recalculated using the second estimated roots 52. The 
process of calculating the gradient vector and re-estimating the roots is then 
repeated using the new contribution of the recalculated decomposition 
coefficients $b_i$ 54, 56. 

The improvement of the gradient search algorithm is now apparent. In 
gradient search algorithms used in other speech synthesis systems, such as 
the system described in U.S. Pat. Appl. No. 09/800,071 to Lashkari et al., the 
decomposition coefficients are assumed to be constant during successive 
iterations of the gradient search. While this assumption provides acceptable 
results for some applications, improved results are achieved by the gradient 
search algorithm because variations in the decomposition coefficients that 
occur during successive iterations are considered when calculating the 
gradient vector. 

Figures 4-6 show the improved results provided by the optimized 
speech synthesis system. The figures show several different comparisons 
between a prior art LPC synthesis system and the optimized synthesis 
system. The speech sample used for this comparison is a segment of a 
voiced part of the nasal "m". In Figure 4, a timeline-amplitude chart of the 
original speech, a prior art LPC synthesized speech and the optimized 
synthesized speech is shown. As can be seen, the optimally synthesized 
speech matches the original speech much more closely than the LPC synthesized 
speech. 

In Figure 5, the reduction in the synthesis error is shown for successive 
iterations of optimization. At the first iteration, the synthesis error equals the 
LPC synthesis error since the LPC coefficients serve as the starting point for 
the optimization. Thus, the improvement in the synthesis error is zero at the 
first iteration. Thereafter, the synthesis error generally decreases with each 
iteration. Notably, the synthesis error increases (and the improvement 
decreases) at iteration number three. This characteristic occurs when the root 
searching algorithm overshoots the optimal roots. After overshooting the 
optimal roots, the search algorithm can be expected to take the overshoot into 
account in successive iterations, thereby resulting in further reductions in the 
synthesis error. In the example shown, the synthesis error can be seen to be 
reduced by 59% after six iterations. Thus, a significant improvement over the 
LPC synthesis error is possible with the optimization. 

Figure 6 shows a spectral chart of the original speech, the LPC 
synthesized speech and the optimized synthesized speech. As seen in this 
chart, the spectrum of the optimized speech provides a much better match to 
the spectrum of the original speech as compared to the LPC spectrum. The 
improvement in the synthesized spectrum is especially apparent in the 
frequency range of 0 to 1,500 Hz. 
While preferred embodiments of the invention have been described, it 
should be understood that the invention is not so limited, and modifications 
may be made without departing from the invention. The scope of the 
invention is defined by the appended claims, and all devices that come within 
the meaning of the claims, either literally or by equivalence, are intended to be 
embraced therein. 